Manifold Regularization for Kernelized LSTD

Authors

  • Xinyan Yan
  • Krzysztof Choromanski
  • Byron Boots
  • Vikas Sindhwani
Abstract

Policy evaluation, i.e., value function or Q-function approximation, is a key procedure in reinforcement learning (RL). It is a necessary component of policy iteration and can be used for variance reduction in policy gradient methods; its quality therefore has a significant impact on most RL algorithms. Motivated by manifold regularized learning, we propose a novel kernelized policy evaluation method that takes advantage of the intrinsic geometry of the state space learned from data, in order to achieve better sample efficiency and higher accuracy in Q-function approximation. Applying the proposed method in the Least-Squares Policy Iteration (LSPI) framework, we observe superior performance in terms of policy quality compared to widely used parametric basis functions on two standard benchmarks.
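To make the abstract's ingredients concrete, here is a minimal sketch (not the paper's exact formulation) of manifold-regularized kernelized LSTD: a Bellman-residual kernel regression over sampled state-action pairs, with an added graph-Laplacian penalty that encourages the Q-function to vary smoothly along the data manifold. The Gaussian kernel, the kNN graph construction, and the weights `lam_k` and `lam_l` are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Gaussian kernel matrix between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def knn_laplacian(X, k=5):
    # Unnormalized graph Laplacian of a symmetrized kNN graph over X.
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]   # skip self (distance 0)
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                  # symmetrize the graph
    return np.diag(W.sum(1)) - W

def manifold_kernel_lstd(SA, SA_next, r, gamma, lam_k=1e-3, lam_l=1e-2):
    """Q(s, a) = sum_i alpha_i k((s, a), (s_i, a_i)); alpha solves a
    Bellman-residual least-squares problem with an extra graph-Laplacian
    penalty coupling Q-values at nearby state-action samples."""
    K = rbf_kernel(SA, SA)         # kernel among sampled (s, a) pairs
    Kp = rbf_kernel(SA, SA_next)   # kernel vs. successor (s', pi(s'))
    L = knn_laplacian(SA)
    A = K - gamma * Kp             # TD operator in RKHS coordinates
    # Normal equations: Bellman residual + RKHS norm + manifold smoothness.
    G = A.T @ A + lam_k * K + lam_l * (K @ L @ K)
    alpha = np.linalg.solve(G + 1e-8 * np.eye(len(K)), A.T @ r)
    return alpha
```

The returned coefficients define Q(s, a) = Σ_i alpha_i k((s, a), (s_i, a_i)), which can then be plugged into the greedy policy-improvement step of LSPI.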

Related papers

Spectral Methods for Linear and Non-Linear Semi-Supervised Dimensionality Reduction

We present a general framework of spectral methods for semi-supervised dimensionality reduction. Applying an approach called manifold regularization, our framework naturally generalizes existing supervised frameworks. Furthermore, by our two semi-supervised versions of the representer theorem, our framework can be kernelized as well. Using our framework, we give three examples of semi-supervise...
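For reference, the canonical manifold-regularization objective (Belkin, Niyogi, and Sindhwani) that such semi-supervised frameworks build on is the following; the cited paper's two semi-supervised representer theorems may generalize it in detail:

```latex
f^{\ast} \;=\; \arg\min_{f \in \mathcal{H}_K}\;
  \frac{1}{l} \sum_{i=1}^{l} V\!\left(x_i, y_i, f\right)
  \;+\; \gamma_A \,\lVert f \rVert_K^2
  \;+\; \frac{\gamma_I}{(l+u)^2}\, \mathbf{f}^{\top} L\, \mathbf{f}
```

Here V is a loss on the l labeled examples, L is the graph Laplacian over all l + u examples, and **f** = (f(x_1), ..., f(x_{l+u})). A representer theorem then guarantees an expansion f*(x) = Σ_{i=1}^{l+u} α_i K(x_i, x), which is what makes the framework kernelizable.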

Image alignment via kernelized feature learning

Machine learning is an application of artificial intelligence in which systems automatically learn and improve from experience without being explicitly programmed. The primary assumption in most machine learning algorithms is that the training set (source domain) and the test set (target domain) are drawn from the same probability distribution. However, in most real-world application...

Kernel Recursive Least-Squares Temporal Difference Algorithms with Sparsification and Regularization

By combining with sparse kernel methods, least-squares temporal difference (LSTD) algorithms can construct the feature dictionary automatically and obtain better generalization ability. However, previous kernel-based LSTD algorithms do not consider regularization, and their sparsification processes are batch or offline, which hinders their widespread application in online learning problems...
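As an illustration of the sparsification half of such algorithms, below is a sketch of the approximate-linear-dependence (ALD) test commonly used for online dictionary construction in kernel RL; the threshold `nu` and bandwidth `sigma` are illustrative choices, and the cited paper's exact criterion may differ.

```python
import numpy as np

def gauss_k(x, y, sigma=1.0):
    # Gaussian kernel between two feature vectors.
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

class SparseDictionary:
    """Online dictionary built with the approximate-linear-dependence
    (ALD) test: a new sample is added only if it cannot be represented
    by the current dictionary within tolerance nu."""
    def __init__(self, nu=0.1, sigma=1.0):
        self.nu, self.sigma = nu, sigma
        self.D = []        # dictionary elements
        self.Kinv = None   # inverse kernel matrix of the dictionary

    def admit(self, x):
        if not self.D:
            self.D = [x]
            self.Kinv = np.array([[1.0 / gauss_k(x, x, self.sigma)]])
            return True
        k = np.array([gauss_k(d, x, self.sigma) for d in self.D])
        c = self.Kinv @ k                       # best linear coefficients
        delta = gauss_k(x, x, self.sigma) - k @ c   # ALD residual
        if delta > self.nu:                     # not representable: grow
            n = len(self.D)
            self.D.append(x)
            # Block inverse update of the enlarged kernel matrix.
            Kinv = np.zeros((n + 1, n + 1))
            Kinv[:n, :n] = self.Kinv + np.outer(c, c) / delta
            Kinv[:n, n] = -c / delta
            Kinv[n, :n] = -c / delta
            Kinv[n, n] = 1.0 / delta
            self.Kinv = Kinv
            return True
        return False
```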

A Dantzig Selector Approach to Temporal Difference Learning

LSTD is a popular algorithm for value function approximation. Whenever the number of features is larger than the number of samples, it must be paired with some form of regularization. In particular, ℓ1-regularization methods tend to perform feature selection by promoting sparsity, and are thus well-suited for high-dimensional problems. However, since LSTD is not a simple regression algorithm, b...
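For context, here is a hedged sketch of the Lasso-TD-style fixed point that ℓ1-regularized LSTD methods aim at; it is not the paper's Dantzig-selector formulation, and `lam`, the step size, and the iteration count are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the l1 norm.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_td(Phi, Phi_next, r, gamma=0.99, lam=0.1, iters=1000):
    """Fixed-point sketch: alternate between forming bootstrapped targets
    y = r + gamma * Phi_next @ w and taking one proximal-gradient (ISTA)
    step on the Lasso problem min_w 0.5*||Phi w - y||^2 + lam*||w||_1."""
    n, d = Phi.shape
    w = np.zeros(d)
    lr = 1.0 / (np.linalg.norm(Phi, 2) ** 2 + 1e-12)  # 1 / Lipschitz const.
    for _ in range(iters):
        y = r + gamma * Phi_next @ w        # bootstrapped TD targets
        grad = Phi.T @ (Phi @ w - y)        # gradient of the quadratic part
        w = soft_threshold(w - lr * grad, lr * lam)
    return w
```

Because the targets themselves depend on w, this is not an ordinary regression; that coupling is exactly why standard Lasso theory does not carry over directly, which motivates alternative formulations such as the Dantzig selector.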

Sustainable ℓ2-regularized actor-critic based on recursive least-squares temporal difference learning

Least-squares temporal difference learning (LSTD) has been used mainly to improve the data efficiency of the critic in actor-critic (AC) methods. However, convergence analysis of the resulting algorithms is difficult when the policy is changing. In this paper, a new AC method is proposed based on LSTD under the discounted criterion. The method comprises two components as the contribution: (1) LSTD works in an ...
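A minimal sketch of the ridge (ℓ2) regularized recursive LSTD critic that such actor-critic methods build on, assuming a Sherman-Morrison rank-one update of the inverse; the regularization coefficient `beta` is illustrative and the cited paper's exact recursion may differ.

```python
import numpy as np

class RLSTD:
    """Recursive least-squares TD with l2 (ridge) regularization:
    maintains the inverse of A = sum_t phi_t (phi_t - gamma*phi'_t)^T
    + beta*I via Sherman-Morrison rank-one updates."""
    def __init__(self, d, gamma=0.99, beta=1.0):
        self.gamma = gamma
        self.Ainv = np.eye(d) / beta   # inverse of the regularizer beta*I
        self.b = np.zeros(d)

    def update(self, phi, phi_next, reward):
        u = phi - self.gamma * phi_next
        # Sherman-Morrison: incorporate the rank-one term phi u^T into Ainv.
        Au = self.Ainv @ phi
        uA = u @ self.Ainv
        self.Ainv -= np.outer(Au, uA) / (1.0 + u @ Au)
        self.b += reward * phi
        return self.Ainv @ self.b      # current value-function weights
```

Because only a d × d inverse is maintained, each update costs O(d²) rather than the O(d³) of re-solving the LSTD system from scratch.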

Journal:
  • CoRR

Volume: abs/1710.05387

Publication year: 2017